Partially Observable Reinforcement Learning


Learning Reward Machines for Partially Observable Reinforcement Learning

Icarte, Rodrigo Toro, Waldie, Ethan, Klassen, Toryn, Valenzano, Rick, Castro, Margarita, McIlraith, Sheila

Neural Information Processing Systems

Reward Machines (RMs), originally proposed for specifying problems in Reinforcement Learning (RL), provide a structured, automata-based representation of a reward function that allows an agent to decompose problems into subproblems that can be efficiently learned using off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.
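
The decomposition described in this abstract hinges on the RM being a small finite automaton whose state is carried alongside the environment observation. Below is a minimal Python sketch of such a machine; the class name, transition encoding, and the coffee-delivery example are illustrative assumptions, not the authors' implementation.

    class RewardMachine:
        """A finite automaton over abstract events: each transition emits a
        reward, and the machine state augments the agent's (partial) observation."""

        def __init__(self, delta_u, delta_r, u0=0):
            self.delta_u = delta_u  # (rm_state, event) -> next rm_state
            self.delta_r = delta_r  # (rm_state, event) -> scalar reward
            self.u = u0             # current RM state

        def step(self, event):
            """Advance on one detected event; return the reward it emits."""
            r = self.delta_r.get((self.u, event), 0.0)
            self.u = self.delta_u.get((self.u, event), self.u)
            return r

    # Example: "pick up coffee, then reach the office" as a 3-state machine.
    rm = RewardMachine(
        delta_u={(0, "coffee"): 1, (1, "office"): 2},
        delta_r={(1, "office"): 1.0},
    )
    print(rm.step("coffee"), rm.step("office"))  # 0.0 1.0

Conditioning a memoryless policy on the current RM state is what turns each automaton state into one of the subproblems the abstract refers to.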


Provable Partially Observable Reinforcement Learning with Privileged Information

Neural Information Processing Systems

Partial observability of the underlying states generally presents significant challenges for reinforcement learning (RL). In practice, certain privileged information, e.g., access to states from simulators, has been exploited in training and has achieved prominent empirical success. To better understand the benefits of privileged information, we revisit and examine several simple, practically used paradigms in this setting, with analyses of both computational and sample efficiency. Specifically, we first formalize the empirical paradigm of expert distillation (also known as teacher-student learning), demonstrating a pitfall it faces in finding near-optimal policies. We then identify a condition on the partially observable environment, the deterministic filter condition, under which expert distillation achieves sample and computational complexities that are both polynomial.
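
As a concrete illustration of expert distillation, the sketch below performs one supervised update of an observation-based student toward actions chosen by a state-privileged expert. The network sizes, PyTorch setup, and dummy data are assumptions for illustration only, not the paper's algorithm.

    import torch
    import torch.nn as nn

    obs_dim, n_actions = 16, 4
    # The student sees only observations; the expert's actions were computed
    # with privileged access to the underlying state (e.g., inside a simulator).
    student = nn.Sequential(nn.Linear(obs_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
    opt = torch.optim.Adam(student.parameters(), lr=1e-3)

    def distill_batch(obs, expert_actions):
        """One teacher-student step: cross-entropy toward the expert's actions."""
        loss = nn.functional.cross_entropy(student(obs), expert_actions)
        opt.zero_grad()
        loss.backward()
        opt.step()
        return loss.item()

    # Dummy batch standing in for trajectories labeled by the expert.
    print(distill_batch(torch.randn(32, obs_dim), torch.randint(0, n_actions, (32,))))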


Partially Observable Reinforcement Learning with Memory Traces

Eberhard, Onno, Muehlebach, Michael, Vernade, Claire

arXiv.org Artificial Intelligence

Partially observable environments present a considerable computational challenge in reinforcement learning due to the need to consider long histories. Learning with a finite window of observations quickly becomes intractable as the window length grows. In this work, we introduce memory traces. Inspired by eligibility traces, these are compact representations of the history of observations in the form of exponential moving averages. We prove sample complexity bounds for the problem of offline on-policy evaluation that quantify the value errors achieved with memory traces for the class of Lipschitz continuous value estimates. We establish a close connection to the window approach, and demonstrate that, in certain environments, learning with memory traces is significantly more sample efficient. Finally, we underline the effectiveness of memory traces empirically in online reinforcement learning experiments for both value prediction and control.
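
The core object is simple enough to state in a few lines: a memory trace is an exponential moving average of (featurized) observations. The recursion and decay value below are illustrative assumptions consistent with the abstract's description, not the paper's exact definition.

    import numpy as np

    def update_trace(z, phi_obs, lam=0.9):
        """Exponential moving average of observation features: recent
        observations dominate and older ones decay geometrically, so the
        history is summarized in a fixed-size vector, not a growing window."""
        return lam * z + (1.0 - lam) * phi_obs

    z = np.zeros(4)
    for obs in np.eye(4):      # four one-hot observations in sequence
        z = update_trace(z, obs)
    print(np.round(z, 3))      # the oldest entries have decayed the most

    # A value function or policy then conditions on (current observation, z).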


Reviews: Learning Reward Machines for Partially Observable Reinforcement Learning

Neural Information Processing Systems

The authors propose a novel approach for solving POMDPs by simultaneously learning and solving reward machines. The method relies on building a finite state machine that correctly predicts possible observations and rewards. The authors demonstrate that their method outperforms baselines in three different partially observable gridworlds. Overall, I found the paper clear and well motivated. Learning to solve POMDPs is a very challenging problem, and any progress or insight has the potential for significant impact.

